The principle of maximum entropy is a broadly applicable technique for computing a distribution with the least information possible while being constrained to match empirically estimated feature expectations. However, in many real-world settings the feature expectations are computed from noisy sensors, and obtaining them may be challenging due to partial observation of the relevant model variables. For example, a robot performing apprenticeship learning may lose sight of the agent it is learning from because of environmental occlusion. We show that, in generalizing the principle of maximum entropy to these types of scenarios, we unavoidably introduce a dependency on the learned model into the empirical feature expectations. We introduce the principle of uncertain maximum entropy and present an expectation-maximization-based solution generalized from the principle of latent maximum entropy. Finally, we experimentally demonstrate the robustness of our technique to noisy data in a maximum causal entropy inverse reinforcement learning domain.
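For orientation, the textbook maximum-entropy program that this abstract builds on (not the paper's generalized formulation) can be sketched as:

```latex
\max_{P}\; H(P) = -\sum_{x} P(x)\log P(x)
\quad\text{s.t.}\quad
\sum_{x} P(x)\,\phi_k(x) = \hat{\phi}_k \;\;\forall k,
\qquad \sum_{x} P(x) = 1 .
```

With noisy or partial observations, the targets \hat{\phi}_k can no longer be read directly off the data; estimating them through an observation model, e.g. \hat{\phi}_k \approx \frac{1}{N}\sum_{i}\sum_{x} P(x \mid o_i)\,\phi_k(x) over observations o_i, is one way to see the dependence of the empirical feature expectations on the learned model that the abstract describes and that the EM-based solution addresses.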
Distributed task allocation for multiple agents raises fundamental and novel problems in control theory and robotics. A new challenge is to develop distributed algorithms that dynamically allocate tasks to multiple agents without relying on prior allocation information. This work proposes a distributed approach to multi-robot task management based on a message-expiration validation method. Our approach handles disconnection-induced conflicts in distributed multi-robot systems by using distance- and timestamp-based measurements to validate each robot's task allocation. Simulation experiments on a robot simulator platform verify the effectiveness of the proposed method.
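A minimal sketch of the kind of conflict check described above, combining message expiration with distance- and timestamp-based validation; the field names, expiry window, and tie-breaking rule are assumptions rather than the paper's protocol:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    robot_id: int
    task_id: int
    distance: float   # claimed robot-to-task distance
    timestamp: float  # time the claim message was sent

EXPIRY = 5.0  # seconds after which a claim message is considered stale (assumed value)

def resolve(claims, now):
    """Per task, keep the best non-expired claim: prefer the closer robot,
    breaking ties in favour of the fresher message."""
    winners = {}
    for c in claims:
        if now - c.timestamp > EXPIRY:  # expired messages are discarded
            continue
        best = winners.get(c.task_id)
        if best is None or (c.distance, -c.timestamp) < (best.distance, -best.timestamp):
            winners[c.task_id] = c
    return winners  # task_id -> winning Claim
```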
The suitability of swarm robots for foraging tasks stems from their compact size and low cost. A considerable amount of energy is required to perform such tasks, especially when they are continuous and/or repetitive. Real-world scenarios in which robots must perform tasks continuously while staying alive (survivability) and maximizing production (performance) demand energy awareness. This paper proposes an energy-aware distributed task allocation algorithm for continuous tasks (e.g., unlimited foraging) performed by cooperating robots, so as to achieve efficient task execution. We regard efficiency as a function of the energy the robots consume during exploration and collection while returning food to the collection bin. The proposed energy-efficient algorithm minimizes the total travel time to the charging station and the time spent charging, maximizing each robot's lifetime so that it can perform the maximum number of tasks and thereby improving the overall efficiency of the cooperating robots. We compare the efficiency and performance of the proposed approach against a typical greedy benchmark strategy (assigning the nearest available robot to the nearest collection bin and charging the robot to full) across a variety of scenarios. The proposed approach significantly improves performance and efficiency over the baseline method.
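For context, the greedy benchmark strategy the abstract compares against (nearest available robot to the nearest collection bin, followed by a full recharge) can be sketched roughly as follows; the coordinate representation is an assumption:

```python
import math

def greedy_assign(robots, bins):
    """Greedy baseline: repeatedly pair the closest available robot and
    collection bin. robots and bins are lists of (id, (x, y)) tuples."""
    robots, bins = list(robots), list(bins)
    assignment = {}
    while robots and bins:
        # pick the globally closest robot-bin pair
        r, b = min(((r, b) for r in robots for b in bins),
                   key=lambda rb: math.dist(rb[0][1], rb[1][1]))
        assignment[r[0]] = b[0]
        robots.remove(r)
        bins.remove(b)
    return assignment  # robot_id -> bin_id; each robot then recharges fully
```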
Human pose estimation has been widely applied in various industries. While recent decades have witnessed the introduction of many advanced two-dimensional (2D) human pose estimation solutions, three-dimensional (3D) human pose estimation is still an active research field in computer vision. Generally speaking, 3D human pose estimation methods can be divided into two categories: single-stage and two-stage. In this paper, we focused on the 2D-to-3D lifting process in two-stage methods and proposed a more advanced baseline model for 3D human pose estimation, based on existing solutions. Our improvements include optimization of the machine learning models and of multiple parameters, as well as the introduction of a weighted loss into training. Finally, we evaluated the final model on the Human3.6M benchmark, where it produced satisfactory results.
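The abstract mentions a weighted loss without specifying its form; one plausible variant, a per-joint weighted MPJPE with hypothetical joint_weights, might look like this sketch:

```python
import torch

def weighted_mpjpe(pred, target, joint_weights):
    """Weighted mean per-joint position error.

    pred, target: (batch, num_joints, 3) predicted and ground-truth 3D joints;
    joint_weights: (num_joints,) per-joint weights, e.g. emphasizing hard
    joints such as wrists and ankles (a hypothetical weighting scheme)."""
    per_joint_err = torch.norm(pred - target, dim=-1)        # (batch, num_joints)
    weighted = (per_joint_err * joint_weights).sum(dim=-1)   # weighted sum over joints
    return weighted.mean() / joint_weights.sum()             # normalize, average over batch
```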
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability, whereby it enables effective zero-shot cross-lingual transfer of syntactic knowledge. The transfer is more successful between some languages, but it is not well understood what leads to this variation and whether it fairly reflects difference between languages. In this work, we investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages. We demonstrate that the distance between the distributions of different languages is highly consistent with the syntactic difference in terms of linguistic formalisms. Such difference learnt via self-supervision plays a crucial role in the zero-shot transfer performance and can be predicted by variation in morphosyntactic properties between languages. These results suggest that mBERT properly encodes languages in a way consistent with linguistic diversity and provide insights into the mechanism of cross-lingual transfer.
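As a toy illustration of comparing languages by their induced grammatical-relation distributions (the paper does not necessarily use this metric; the relation inventory and counts below are made up), one could measure the Jensen-Shannon distance:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def relation_distribution(counts, relations):
    """Normalize raw grammatical-relation counts into a distribution."""
    vec = np.array([counts.get(r, 0) for r in relations], dtype=float)
    return vec / vec.sum()

# Hypothetical relation counts induced from mBERT for two languages
relations = ["nsubj", "obj", "obl", "amod", "nmod"]
counts_l1 = {"nsubj": 120, "obj": 90, "obl": 60, "amod": 150, "nmod": 80}
counts_l2 = {"nsubj": 100, "obj": 110, "obl": 95, "amod": 40, "nmod": 155}

dist = jensenshannon(relation_distribution(counts_l1, relations),
                     relation_distribution(counts_l2, relations))
print(f"Jensen-Shannon distance: {dist:.3f}")
```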
Optimization in multi-task learning (MTL) is more challenging than single-task learning (STL), as the gradient from different tasks can be contradictory. When tasks are related, it can be beneficial to share some parameters among them (cooperation). However, some tasks require additional parameters with expertise in a specific type of data or discrimination (specialization). To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad'). This structure allows us to formalize cooperation and specialization as the process of matching experts and tasks. We optimize this matching process during the training of a single model. Specifically, we incorporate mixture of experts (MoE) layers into a transformer model, with a new loss that incorporates the mutual dependence between tasks and experts. As a result, only a small set of experts are activated for each task. This prevents the sharing of the entire backbone model between all tasks, which strengthens the model, especially when the training set size and the number of tasks scale up. More interestingly, for each task, we can extract the small set of experts as a standalone model that maintains the same performance as the large model. Extensive experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
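As a rough illustration of matching experts to tasks (not the authors' implementation; the routing scheme, shapes, and hyperparameters are assumptions), a task-conditioned MoE layer with sparse top-k activation might look like:

```python
import torch
import torch.nn as nn

class TaskMoELayer(nn.Module):
    """Minimal sketch of a task-conditioned mixture-of-experts layer.

    Each task routes tokens to a small subset of experts (top-k gating),
    so tasks can share some experts (cooperation) while keeping others
    task-specific (specialization). All sizes are illustrative."""
    def __init__(self, dim=256, num_experts=8, top_k=2, num_tasks=13):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)])
        # One router per task: maps token features to expert logits
        self.routers = nn.ModuleList([nn.Linear(dim, num_experts) for _ in range(num_tasks)])
        self.top_k = top_k

    def forward(self, x, task_id):
        # x: (tokens, dim) features for one task
        logits = self.routers[task_id](x)                         # (tokens, num_experts)
        weights, idx = logits.softmax(-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)         # renormalize over top-k
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return out
```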
In this work, we present a novel framework built to simplify 3D asset generation for amateur users. To enable interactive generation, our method supports a variety of input modalities that can be easily provided by a human, including images, text, partially observed shapes and combinations of these, further allowing to adjust the strength of each input. At the core of our approach is an encoder-decoder, compressing 3D shapes into a compact latent representation, upon which a diffusion model is learned. To enable a variety of multi-modal inputs, we employ task-specific encoders with dropout followed by a cross-attention mechanism. Due to its flexibility, our model naturally supports a variety of tasks, outperforming prior works on shape completion, image-based 3D reconstruction, and text-to-3D. Most interestingly, our model can combine all these tasks into one swiss-army-knife tool, enabling the user to perform shape generation using incomplete shapes, images, and textual descriptions at the same time, providing the relative weights for each input and facilitating interactivity. Despite our approach being shape-only, we further show an efficient method to texture the generated shape using large-scale text-to-image models.
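A rough sketch of the multi-modal conditioning idea (per-modality encoders, modality dropout during training, then cross-attention over the remaining condition tokens); module names, feature sizes, and the weighting scheme are assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class MultiModalCondition(nn.Module):
    """Encode each available input (image, text, partial shape) separately,
    randomly drop modalities during training, and let the diffusion backbone
    cross-attend to whatever condition tokens remain."""
    def __init__(self, dim=512, p_drop=0.3):
        super().__init__()
        self.encoders = nn.ModuleDict({
            "image": nn.Linear(2048, dim),   # pooled image features (assumed size)
            "text": nn.Linear(768, dim),     # pooled text embedding (assumed size)
            "shape": nn.Linear(1024, dim),   # partial-shape encoding (assumed size)
        })
        self.p_drop = p_drop
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, latents, inputs, weights=None):
        # latents: (B, N, dim) diffusion tokens; inputs: dict of raw features
        tokens = []
        for name, feat in inputs.items():
            if self.training and torch.rand(()) < self.p_drop:
                continue                                   # modality dropout
            tok = self.encoders[name](feat).unsqueeze(1)   # (B, 1, dim)
            if weights is not None:                        # per-input strength
                tok = tok * weights.get(name, 1.0)
            tokens.append(tok)
        if not tokens:                                     # unconditional pass
            return latents
        cond = torch.cat(tokens, dim=1)                    # (B, M, dim)
        attended, _ = self.cross_attn(latents, cond, cond)
        return latents + attended
```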
Image instance segmentation is a fundamental research topic in autonomous driving, which is crucial for scene understanding and road safety. Advanced learning-based approaches often rely on the costly 2D mask annotations for training. In this paper, we present a more artful framework, LiDAR-guided Weakly Supervised Instance Segmentation (LWSIS), which leverages the off-the-shelf 3D data, i.e., Point Cloud, together with the 3D boxes, as natural weak supervisions for training the 2D image instance segmentation models. Our LWSIS not only exploits the complementary information in multimodal data during training, but also significantly reduces the annotation cost of the dense 2D masks. In detail, LWSIS consists of two crucial modules, Point Label Assignment (PLA) and Graph-based Consistency Regularization (GCR). The former module aims to automatically assign the 3D point cloud as 2D point-wise labels, while the latter further refines the predictions by enforcing geometry and appearance consistency of the multimodal data. Moreover, we conduct a secondary instance segmentation annotation on the nuScenes, named nuInsSeg, to encourage further research on multimodal perception tasks. Extensive experiments on the nuInsSeg, as well as the large-scale Waymo, show that LWSIS can substantially improve existing weakly supervised segmentation models by only involving 3D data during training. Additionally, LWSIS can also be incorporated into 3D object detectors like PointPainting to boost the 3D detection performance for free. The code and dataset are available at https://github.com/Serenos/LWSIS.
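An illustrative sketch of the Point Label Assignment step (projecting LiDAR points that fall inside an object's 3D box into the image to obtain 2D point-wise labels); the projection convention and thresholds here are generic assumptions, not nuScenes-specific code:

```python
import numpy as np

def assign_point_labels(points, in_box_mask, cam_intrinsic, lidar_to_cam, img_hw):
    """Project in-box LiDAR points to pixel coordinates as weak 2D labels.

    points: (N, 3) LiDAR points; in_box_mask: (N,) bool, True if inside a 3D box;
    lidar_to_cam: (4, 4) extrinsic; cam_intrinsic: (3, 3); img_hw: (H, W)."""
    pts = points[in_box_mask]
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])     # homogeneous coordinates
    cam = (lidar_to_cam @ pts_h.T)[:3]                   # (3, M) camera frame
    front = cam[2] > 0.1                                  # keep points in front of the camera
    uvw = cam_intrinsic @ cam[:, front]
    uv = (uvw[:2] / uvw[2]).T                             # (M', 2) pixel coordinates
    h, w = img_hw
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[valid]                                      # pixels labeled as foreground points
```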
Effective data imputation demands rich latent "structure" discovery capabilities from "plain" tabular data. Recent graph neural network-based data imputation solutions show strong structure-learning potential by directly translating tabular data into bipartite graphs. However, due to a lack of relations between samples, those solutions treat all samples equally, which runs against one important observation: "similar samples should give more information about missing values." This paper presents a novel Iterative graph Generation and Reconstruction framework for Missing data imputation (IGRM). Instead of treating all samples equally, we introduce the concept of "friend networks" to represent different relations among samples. To generate an accurate friend network with missing data, an end-to-end friend network reconstruction solution is designed to allow continuous friend network optimization during imputation learning. The representation of the optimized friend network, in turn, is used to further optimize the data imputation process with differentiated message passing. Experimental results on eight benchmark datasets show that IGRM yields 39.13% lower mean absolute error compared with nine baselines and 9.04% lower than the second-best.
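A simplified sketch of the "friend network" notion: link each sample to its most similar samples, with similarity computed only over features observed in both (the paper learns this network end-to-end; this static k-nearest-friend version is only for intuition):

```python
import numpy as np

def friend_network(X, mask, k=5):
    """Build a k-nearest-friend graph from incomplete tabular data.

    X: (n, d) data matrix with arbitrary values where mask == 0;
    mask: (n, d) binary, 1 where a value is observed. Similarity is the
    negative mean absolute difference over co-observed features."""
    n = len(X)
    friends = np.zeros((n, n), dtype=bool)
    for i in range(n):
        sims = np.full(n, -np.inf)
        for j in range(n):
            if i == j:
                continue
            both = (mask[i] == 1) & (mask[j] == 1)
            if both.any():
                sims[j] = -np.mean(np.abs(X[i, both] - X[j, both]))
        for j in np.argsort(sims)[-k:]:          # k most similar samples
            if np.isfinite(sims[j]):
                friends[i, j] = True
    return friends  # friends[i, j] == True means j is a friend of i
```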
Hashing has been widely researched to solve the large-scale approximate nearest neighbor search problem owing to its time and storage superiority. In recent years, a number of online hashing methods have emerged, which can update the hash functions to adapt to new stream data and realize dynamic retrieval. However, existing online hashing methods are required to update the whole database with the latest hash functions when a query arrives, which leads to low retrieval efficiency as the stream data continuously grows. On the other hand, these methods ignore the supervision relationship among the examples, especially in the multi-label case. In this paper, we propose a novel Fast Online Hashing (FOH) method which only updates the binary codes of a small part of the database. To be specific, we first build a query pool in which the nearest neighbors of each central point are recorded. When a new query arrives, only the binary codes of the corresponding potential neighbors are updated. In addition, we create a similarity matrix which takes the multi-label supervision information into account and introduce a multi-label projection loss to further preserve the similarity among the multi-label data. Experimental results on two common benchmarks show that the proposed FOH achieves a dramatic advantage in query time, up to 6.28 seconds faster than state-of-the-art baselines, while maintaining competitive retrieval accuracy.
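A compact sketch of the query-pool idea: precompute the neighbors of a set of central points, and when a new query arrives, re-encode only the database items pooled under the nearest centre rather than the whole database (the centres, pool size, and hash_fn are placeholders, not the authors' implementation):

```python
import numpy as np

def build_query_pool(database, centers, pool_size=100):
    """Record, for each centre, the indices of its nearest database items."""
    pool = {}
    for c, center in enumerate(centers):
        d = np.linalg.norm(database - center, axis=1)
        pool[c] = np.argsort(d)[:pool_size]
    return pool

def search(query, database, codes, centers, pool, hash_fn):
    """On a new query, update binary codes only for the pooled neighbors of
    the closest centre, then rank all items by Hamming distance."""
    c = int(np.argmin(np.linalg.norm(centers - query, axis=1)))
    idx = pool[c]
    codes[idx] = hash_fn(database[idx])      # partial re-hash with the latest hash function
    q_code = hash_fn(query[None])[0]
    hamming = (codes != q_code).sum(axis=1)  # Hamming distance to every item
    return np.argsort(hamming)               # ranked retrieval result
```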